
[BYOC] Handle constants in IRModule-at-a-time external codegen #11770

Merged: 1 commit merged into apache:main on Jun 30, 2022

Conversation

@mbs-octoml (Contributor) commented on Jun 17, 2022

I tried to do to the TensorRT integration what #11631 did to the CUTLASS integration, viz:

  • Make sure all compilation options are passed in Target instances. This helps Collage.
  • Use a custom pass invoked via RelayToTIRTargetHooks instead of the relay.ext.$toolchain mechanism.
    This helps us decouple external codegen from lowering.

This PR collects the prep for that change:

  • TensorRT uses the JSONSerializer visitor to encode each partition function. Previously, when the
    visitor encountered a Constant it simply generated and recorded a name for the constant. Then,
    completely separately, and via a callback in TECompiler, the function is visited again in the
    same order and with the same name generation convention by a ConstantUpdater to actually collect the
    bindings, which are then encoded into a ConstLoaderModule to be made available at runtime.

    However, if all TensorRT compilation is to be done by a stand-alone pass, there's no TECompiler callback
    hackery available. So I've added a "const_name_to_ndarray" attribute to the IRModule, of type
    Map<String, runtime::NDArray>, so that named constants can be accumulated throughout compilation by
    any pass which needs to do so. Then the Graph, AOT and VM executors are all updated to merge those
    constants into the final runtime artifact. (A minimal sketch of how a pass might populate this
    attribute follows this list.)

    (Compare with "Constants", the equivalent attribute for extracting TIR AllocateConsts.)

  • The TensorRT tests use the create_executor interface, but it wasn't quite ready for the
    new, more general form of passing a list of targets.

  • I want TensorRT compilation to work out of the box, with no special target needed when all the
    default options apply. I've gone back and made my earlier CUTLASS integration follow the same
    convention.

  • To test this I also switched the 'demo' "ccompiler" external codegen target to IRModule-at-a-time
    style. This means we can test most of the external codegen machinery in one place without depending
    on any target which may not be enabled in CI (e.g. TensorRT):

    • Target instances are plumbed correctly so compile-time options are available.
    • External modules are conveyed to the final export library.
    • Constant bindings are conveyed to the metadata module.
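
For illustration, here is a minimal sketch of how a stand-alone codegen pass could accumulate a named constant into the new attribute. This is not the PR's actual code: the helper name RecordExternConstant is made up, but GetAttr, WithAttr and Map<String, runtime::NDArray> are the real TVM building blocks involved.

```cpp
// Sketch only: how an IRModule-at-a-time external codegen pass might record a
// constant binding in the module-level "const_name_to_ndarray" attribute
// (tvm::attr::kConstNameToNDArray in this PR) so the Graph/AOT/VM executors can
// later merge it into the runtime artifact. RecordExternConstant is a
// hypothetical helper, not an API added by this PR.
#include <tvm/ir/module.h>
#include <tvm/runtime/container/map.h>
#include <tvm/runtime/ndarray.h>

namespace {

using tvm::IRModule;
using tvm::Map;
using tvm::String;
using tvm::WithAttr;
using tvm::runtime::NDArray;

IRModule RecordExternConstant(IRModule mod, const String& name, const NDArray& data) {
  // Start from whatever earlier passes already accumulated (empty map if none).
  Map<String, NDArray> const_name_to_ndarray =
      mod->GetAttr<Map<String, NDArray>>("const_name_to_ndarray")
          .value_or(Map<String, NDArray>());
  const_name_to_ndarray.Set(name, data);
  // Module attributes are functional, so attach the updated map to a copy of the module.
  return WithAttr(std::move(mod), "const_name_to_ndarray", const_name_to_ndarray);
}

}  // namespace
```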

@mbs-octoml (Contributor, Author)

tests/python/frontend/pytorch/test_forward.py::test_argsort appears to be a flake.

@masahi (Member) left a comment

cc @Mousius @manupa-arm

  • A minor suggestion: use value_or rather than supplying a default value and having to do .value(), i.e. lowered_mod->GetAttr<Map<String, runtime::NDArray>>(tvm::attr::kConstNameToNDArray)->value_or({}) (not sure if the {} thing works). See the before/after sketch right after this list.
  • Split the inline stuff into another PR.
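
For reference, a rough before/after sketch of that suggestion (not the exact PR diff; it assumes lowered_mod is the lowered IRModule, the usual TVM headers, and spells the fallback as an explicit empty Map rather than {}):

```cpp
#include <tvm/ir/module.h>

using tvm::IRModule;
using tvm::Map;
using tvm::String;
using tvm::runtime::NDArray;

Map<String, NDArray> GetConstBindings(const IRModule& lowered_mod) {
  // Before: pass a default into GetAttr, then unwrap the Optional with .value().
  // return lowered_mod
  //     ->GetAttr<Map<String, NDArray>>(tvm::attr::kConstNameToNDArray, Map<String, NDArray>())
  //     .value();

  // After (the suggestion): read the Optional and let value_or supply the empty-map fallback.
  return lowered_mod->GetAttr<Map<String, NDArray>>(tvm::attr::kConstNameToNDArray)
      .value_or(Map<String, NDArray>());
}
```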

@areusch (Contributor) left a comment

Just did a quick review before I ran out the door. Main question here: is there a test that actually populates the kConstNameToNDArray attr and watches it come out the end of the compiler?

Review threads were opened on:
  • include/tvm/ir/module.h
  • python/tvm/relay/build_module.py
  • src/relay/backend/aot_executor_codegen.cc
  • src/relay/backend/contrib/codegen_json/codegen_json.h
  • src/relay/backend/contrib/cutlass/codegen.cc
@mbs-octoml (Contributor, Author) commented on Jun 27, 2022

Thanks @masahi & @areusch, PTAL.

Re value_or: done, thanks, that's much nicer.
Re split PRs, done, see #11923. This was already a split of a split so sorry for the grab bag.
Re the end-to-end test: I think this should be done in test_target_hooks.py, but really exercising everything requires a new fake JSON-based runtime module, a fake custom codegen pass and a new fake external codegen target. I started down that path but it will be quite a lift. Unfortunately we can't rely on the TensorRT unit test for this (the first user of the new const_name_to_constants attribute), since the functionality isn't testable without actually running TensorRT, which is not enabled in CI. WDYT?

@areusch (Contributor) commented on Jun 28, 2022

I hate to be a stickler, but I feel like something should test this in CI. Would such a fake module become useful in other compiler testing?

@mbs-octoml (Contributor, Author) commented on Jun 29, 2022

It turns out switching the 'ccompiler' demo external codegen to use target hooks is enough to exercise all of this, albeit the PR has grown quite a bit. Let's see what CI makes of it, since I've probably broken something else.
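
For anyone following along, a rough sketch of what "IRModule-at-a-time via target hooks" means here: a target kind registers a module-level pass under its RelayToTIR attribute, and RelayToTIRTargetHooks runs that pass during compilation instead of the relay.ext.$toolchain callback. This is not the PR's actual ccompiler code; the pass name and body below are placeholders, while TVM_REGISTER_TARGET_KIND, CreateModulePass and the "RelayToTIR" attribute are the real mechanisms.

```cpp
// Sketch: registering an IRModule-at-a-time external codegen via a target hook.
// The pass body is a placeholder; a real implementation would generate C source
// for every "ccompiler"-tagged function, attach the resulting runtime::Module,
// and record constants in "const_name_to_ndarray".
#include <tvm/ir/module.h>
#include <tvm/ir/transform.h>
#include <tvm/target/target_kind.h>

namespace {

using tvm::IRModule;
using tvm::transform::CreateModulePass;
using tvm::transform::Pass;
using tvm::transform::PassContext;

Pass CCompilerRelayToTIR() {
  auto pass_func = [](IRModule mod, PassContext ctx) -> IRModule {
    // Placeholder: rewrite the partitioned "ccompiler" functions here and stash
    // the generated runtime::Module and constant bindings as module attributes.
    return mod;
  };
  return CreateModulePass(pass_func, /*opt_level=*/0, "relay.ext.ccompiler.sketch", {});
}

// Hanging the pass off the target kind is what lets RelayToTIRTargetHooks find it.
TVM_REGISTER_TARGET_KIND("ccompiler", kDLCPU)
    .set_attr<Pass>("RelayToTIR", CCompilerRelayToTIR());

}  // namespace
```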

@mbs-octoml (Contributor, Author)

PTAL @areusch

@areusch areusch merged commit 985680e into apache:main Jun 30, 2022
@mbs-octoml mbs-octoml deleted the mbs-prep-for-trt branch June 30, 2022 19:22
blackkker pushed a commit to blackkker/tvm that referenced this pull request Jul 7, 2022
masahi pushed a commit to masahi/tvm that referenced this pull request Jul 15, 2022
mikeseven pushed a commit to mikeseven/tvm that referenced this pull request Sep 27, 2023